Appl. No. 09/703,140

Amdt. dated March 10, 2004

Response to Office Action of November 10, 2003

Amendments to the Specification:

Rewrite the paragraph at page 1, lines 8 to 10 as follows:

This application claims priority under 35 USC §119(e)(1) of Provisional Application No.
60/183,527, filed February 18, 2000 (TI 30302PS) and of Provisional Application No.
60/183,654, filed February 18, 2000 (TI 26010PS).

Rewrite the paragraph at page 6, line 16 to page 7, line 5 as follows: 

In microprocessor 1 there are shown a central processing unit (CPU) 10, data memory 22,
program memory 23, peripherals 60 and an external memory interface (EMIF) with a direct
memory access (DMA) 61. CPU 10 further has an instruction fetch/decode unit 10a-c, a
plurality of execution units, including an arithmetic and load/store unit D1, a multiplier M1, an
ALU/shifter unit S1, an arithmetic logic unit ("ALU") L1, a shared multi-port register file 20a
from which data are read and to which data are written. Instructions are fetched by fetch unit
10a from instruction memory 23 over a set of busses 41. Decoded instructions are provided from
the instruction fetch/decode unit 10a-c to the functional units D1, M1, S1, and L1 over various
sets of control lines which are not shown. Data are provided to/from the register file 20a from/to
load/store unit D1 over a first set of busses 32a, to multiplier M1 over a second set of
busses 34a, to ALU/shifter unit S1 over a third set of busses 36a and to ALU L1 over a fourth set
of busses 38a. Data are provided to/from the memory 22 from/to the load/store unit D1 via
a fifth set of busses 40a. Note that the entire data path described above is duplicated with
register file 20b and execution units D2, M2, S2, and L2. Load/store unit D2 similarly interfaces
with memory 22 via a set of busses 40b. In this embodiment of the present invention, two 
unrelated aligned double word (64 bits) load/store transfers can be made in parallel between CPU 
10 and data memory 22 on each clock cycle using bus set 40a and bus set 40b. 

Rewrite the paragraph at page 8, lines 3 to 12 as follows: 

When microprocessor 1 is incorporated in a data processing system, additional memory
or peripherals may be connected to microprocessor 1, as illustrated in Figure 1. For example,
Random Access Memory (RAM) 70, a Read Only Memory (ROM) 71 and a Disk 72 are shown
connected via an external bus 73. Bus 73 is connected to the External Memory Interface (EMIF)
which is part of functional block 61 within microprocessor 1. A Direct Memory Access (DMA)
controller is also included within block 61. The DMA controller part of functional block 61
connects to data memory 22 via bus 43 and is generally used to move data between memory and
peripherals within microprocessor 1 and memory and peripherals which are external to
microprocessor 1.

Rewrite the paragraph at page 8, lines 16 to 22 as follows: 

A detailed description of various architectural features of the microprocessor of Figure 1 
is provided in coassigned application S.N. 09/012,813 (TI 25311), U.S. Patent 6,182,203, and is
incorporated herein by reference. A description of enhanced architectural features and an
extended instruction set not described herein for CPU 10 is provided in coassigned U.S. Patent
application S.N. 09/703,096 (TI 30302), Microprocessor with Improved
Instruction Set Architecture, and is incorporated herein by reference.

Rewrite the paragraph at page 8, line 23 to page 9, line 2 as follows:

Figure 2 is a block diagram of the execution units and register files of the microprocessor 
of Figure 1 and shows a more detailed view of the buses connecting the various functional 
blocks. In this figure, all data busses are 32 bits wide, unless otherwise noted. There are two
general-purpose register files (A and B) in the processor's data paths. Each of these files contains
32 32-bit registers (A0-A31 for register file A 20a and B0-B31 for register file B 20b). The
general-purpose registers can be used for data, data address pointers, or condition registers. Any
number of reads of a given register can be performed in a given cycle. 

Rewrite the paragraph at page 11, line 3 to page 12, line 7 as follows:

Most data lines in the CPU support 32-bit operands, and some support long (40-bit) and
double word (64-bit) operands. Each functional unit has its own 32-bit write port into a
general-purpose register file (Refer to Figure 2). All units ending in 1 (for example, .L1) write to
register file A 20a and all units ending in 2 write to register file B 20b. Each functional unit has
two 32-bit read ports for source operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an
extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads. Because
each unit has its own 32-bit write port, when performing 32-bit operations all eight units can be
used in parallel every cycle. Since each multiplier can return up to a 64-bit result, two write
ports (dst1 and dst2) are provided from the multipliers to the respective register file.

Rewrite the paragraph at page 12, lines 10 to IS as follows: 

Each functional unit reads directly from and writes directly to the register file within its
own data path. That is, the .L1 unit 18a, .S1 unit 16a, .D1 unit 12a, and .M1 unit 14a write
to register file A 20a and the .L2 unit 18b, .S2 unit 16b, .D2 unit 12b, and .M2 unit 14b
write to register file B 20b. The register files are connected to the opposite-side register file's
functional units via the 1X and 2X cross paths. These cross paths allow functional units from
one data path to access a 32-bit operand from the opposite side's register file. The 1X cross path
allows data path A's functional units to read their source from register file B. Similarly, the 2X
cross path allows data path B's functional units to read their source from register file A.
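
As a rough illustration of the cross paths described above, the following Python sketch models the two register files and a source multiplexer that selects between the same-side register file and the opposite-side file. The names make_register_files, read_operand, write_result and use_cross_path are hypothetical and chosen only for this illustration; the sketch is a behavioral model under those assumptions and does not describe the actual circuitry.

```python
# Behavioral sketch only; all names here are illustrative assumptions.
NUM_REGS = 32          # each register file holds 32 registers
MASK32 = 0xFFFFFFFF    # registers are 32 bits wide


def make_register_files():
    """Return a dict with register files 'A' and 'B', all registers zeroed."""
    return {"A": [0] * NUM_REGS, "B": [0] * NUM_REGS}


def read_operand(files, side, reg_index, use_cross_path):
    """Model a source multiplexer for a unit in data path `side`.

    With use_cross_path False, the operand comes from the same-side
    register file. With use_cross_path True, it comes from the opposite
    side's file (1X for data path A, 2X for data path B).
    """
    if side not in ("A", "B"):
        raise ValueError("side must be 'A' or 'B'")
    source = side
    if use_cross_path:
        source = "B" if side == "A" else "A"
    return files[source][reg_index] & MASK32


def write_result(files, side, reg_index, value):
    """Units write only to the register file in their own data path."""
    files[side][reg_index] = value & MASK32
```

In this model, a unit in data path A that needs an operand held in register file B would call read_operand with side "A" and use_cross_path set to True, and would still write its result to register file A.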

Rewrite the paragraph at page 12, lines 19 to 23 as follows:

All eight of the functional units have access to the opposite side's register file via a cross
path. The .M1, .M2, .S1, .S2, .D1, and .D2 units' src2 inputs are selectable between the cross
path and the same-side register file. In the case of the .L1 and .L2 units, both src1 and src2 inputs
are also selectable between the cross path and the same-side register file. Cross path 1X bus 210
couples to one input of multiplexer 211 for src1 input of .L1 unit 18a, multiplexer 212 for src2
input of .L1 unit 18a, multiplexer 213 for src2 input of .S1 unit 16a and multiplexer 214 for src2
input of .M1 unit 14a. Multiplexers 211, 212, 213, and 214 select between the cross path 1X bus
210 and an output of register file A 20a. Buffer 250 buffers cross path 2X output to similar
multiplexers for .L2, .S2, .M2, and .D2 units.

Insert the following new paragraph at page 13, between lines 9 to 10: 

S2 unit 16b may write to control register file 102 from its dst output via bus 220. S2 unit
16b may read from control register file 102 to its src2 input via bus 221.

Rewrite the paragraph at page 13, line 24 to page 14, line 4 as follows: 

Bus 40a has an address bus DA1 which is driven by mux 200a. This allows an address
generated by either load/store unit D1 or D2 to provide a memory address for loads or stores for
register file 20a. Data Bus LD1 loads data from an address in memory 22 specified by address
bus DA1 to a register in load unit D1. Unit D1 may manipulate the data provided prior to storing
it in register file 20a. Likewise, data bus ST1 stores data from register file 20a to memory 22.
Load/store unit D1 performs the following operations: 32-bit add, subtract, linear and circular
address calculations. Load/store unit D2 operates similarly to unit D1 via bus 40b, with the
assistance of mux 200b for selecting an address.
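
The difference between the linear and circular address calculations mentioned above can be sketched as follows. The text does not specify how the circular mode is parameterized, so this example assumes a power-of-two block size in which only the low-order address bits are updated; the function names and the block_bits parameter are hypothetical and serve only as an illustration.

```python
MASK32 = 0xFFFFFFFF  # addresses are treated as 32-bit values


def linear_address(base, offset):
    """Linear mode: plain 32-bit add (a negative offset gives a subtract)."""
    return (base + offset) & MASK32


def circular_address(base, offset, block_bits):
    """Circular mode (assumed form): only the low `block_bits` bits of the
    address change, so the result wraps inside a 2**block_bits byte block."""
    if not 0 < block_bits < 32:
        raise ValueError("block_bits must be between 1 and 31")
    low_mask = (1 << block_bits) - 1
    wrapped_low = (base + offset) & low_mask
    # Keep the upper address bits of the base unchanged.
    return (base & ~low_mask & MASK32) | wrapped_low
```

Under these assumptions, stepping past the end of a 256-byte block with circular_address wraps back to the start of the same block, whereas linear_address simply continues into the next block.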

Rewrite the paragraph at page 14, lines 15 to 23 as follows: 

Table 3 defines the mapping between instructions and functional units for a set of basic
instructions included in a DSP described in U.S. Patent 6,182,203, S.N. 09/012,813 (TI 25311,
incorporated herein by reference). Table 4 defines a mapping between instructions and
functional units for a set of extended instructions in an embodiment of the present invention.
Alternative embodiments of the present invention may have different sets of instructions and
functional unit mapping. Table 3 and Table 4 are illustrative and are not exhaustive or intended
to limit various embodiments of the present invention.

Rewrite the paragraph at page 19, lines 12 to 18 as follows:

Performance can be inhibited by stalls from the memory system, stalls for cross path
dependencies, or interrupts. The reasons for memory stalls are determined by the memory
architecture. Cross path stalls are described in detail in U.S. Patent S.N. 09/702,453
(TI-30563), to Steiss, et al. and is incorporated herein by reference. To fully understand how to
optimize a program for speed, the sequence of program fetch, data store, and data load requests
the program makes, and how they might stall the CPU should be understood.

Rewrite the paragraph at page 27, line 25 as follows: 

The remaining steps 620, 630 and 640 are identical to Figure 6A.

Rewrite the paragraph at page 28, lines 5 to 27 as follows: 

The .M unit has three major functional units: Galois multiply unit 700a-c, multiply unit 
710 and other non-multiply functional circuitry in block 720. Galois multiplier 700a-c and 
multiplier 710 require three additional cycles to complete the multiply operations, so multiply 
instructions are categorized as having three delay slots. Pipeline registers 730-733 hold partial 
results between each pipeline execution phase. In general, multiply unit 710 can perform the
following operations on a pair of multipliers 711a,b: two 16x16 multiplies or four 8x8 multiplies
with all combinations of signed or unsigned numbers, Q-shifting and P-shifting of multiply
results, rounding for multiply instructions, controlling the carry chain by breaking/joining the
carry chain at 16-bit block boundaries, and saturation multiplication where the final result is
shifted left by 1 or returns 0x7FFFFFFF if an overflow occurs. Galois multiply unit 700
performs Galois multiply in parallel with M multiply unit 710. The lower 32 bits (bits 31:0) of a
result are selected by multiplexer 734 and are stored in the even register of a register pair. The
upper 32 bits (bits 63:32) of the result are selected by multiplexer 735 and are stored in the odd
register of the register pair. A more detailed description of configurable multiply circuitry is
provided in co-assigned U.S. Patent application S.N. 09/703,093 (TI 26010) entitled
Data Processor With Flexible Multiply Unit and is incorporated herein by reference. Details of
the Galois multiply unit are provided in co-assigned U.S. Patent application S.N. 09/507,187
to David Hoyle entitled Galois Field Multiply and is incorporated herein by reference.
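
Two of the multiply operations listed above, the pair of 16x16 multiplies and the saturating multiply, can be sketched in Python as follows. This is a minimal behavioral sketch rather than a description of the multiplier circuitry: the function names are hypothetical, the operands are assumed to be signed, and the assignment of the low-half product to the even register and the high-half product to the odd register is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dual_mpy16(src1, src2):
    """Two signed 16x16 multiplies on the packed halves of two 32-bit words.

    Returns (even, odd): the low-half product is placed in the even
    register (bits 31:0 of the 64-bit result) and the high-half product
    in the odd register (bits 63:32). This placement is an assumption.
    """
    lo = to_signed(src1, 16) * to_signed(src2, 16)
    hi = to_signed(src1 >> 16, 16) * to_signed(src2 >> 16, 16)
    return lo & MASK32, hi & MASK32


def saturating_mpy16(a, b):
    """Signed 16x16 multiply, shifted left by 1, saturated on overflow."""
    result = (to_signed(a, 16) * to_signed(b, 16)) << 1
    if result > 0x7FFFFFFF:  # only reachable for -32768 * -32768
        return 0x7FFFFFFF
    return result & MASK32
```

With signed 16-bit operands, the shifted product exceeds the 32-bit signed range only when both operands are -32768, which is the case in which this sketch returns 0x7FFFFFFF.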